options(timeout = 600) # Increase timeout to 600 seconds
install.packages("INLA", repos=c(getOption("repos"), INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)Geostatistical Modelling using R-INLA
R-INLA
INLA is implemented as an R package called INLA, although this package is also called R-INLA. The package is not available from the main R repository CRAN, but from a specific repository at http://www.r-inla.org. The INLA package is available for Windows, Mac OS X and Linux, and there are a stable and a testing version.
A simple way to install the stable version is shown below. For the testing version, simply replace stable by testing when setting the repository.
The main function in the package is inla(), which is the one used to fit the Bayesian models using the INLA methodology. This function works in a similar way as function glm() (for generalized linear models) or gam() (for generalized additive models). A formula is used to specify the model to be fitted and it can include a mix of fixed and other effects conveniently specified.
Specific (random) effects are specified using the f() function. This includes an index to map the effect to the observations, the type of effect, additional parameters and the priors on the hyperparameters of the effect. When including a random effect in the model, not all of these options need to be specified.
1 Load Packages
library(pacman) Warning: package 'pacman' was built under R version 4.4.1
p_load(sf, rgeos, rgdal, dplyr, tmap, leaflet, tidyverse, raster,lwgeom, corrplot,viridis,mice)Installing package into 'C:/Users/ACER/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)
Warning: package 'rgeos' is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.4:
cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.4/PACKAGES'
Warning in p_install(package, character.only = TRUE, ...):
Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
logical.return = TRUE, : there is no package called 'rgeos'
Installing package into 'C:/Users/ACER/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)
Warning: package 'rgdal' is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.4:
cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.4/PACKAGES'
Warning in p_install(package, character.only = TRUE, ...):
Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
logical.return = TRUE, : there is no package called 'rgdal'
Warning in p_load(sf, rgeos, rgdal, dplyr, tmap, leaflet, tidyverse, raster, : Failed to install/load:
rgeos, rgdal
library(INLA)Warning: package 'INLA' was built under R version 4.4.1
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
This is INLA_24.06.27 built 2024-06-27 02:36:04 UTC.
- See www.r-inla.org/contact-us for how to get help.
- List available models/likelihoods/etc with inla.list.models()
- Use inla.doc(<NAME>) to access documentation
library(readxl)2 Import Polygon
# with mukim
Kel <- st_read("kelantan.shp")Reading layer `kelantan' from data source
`C:\Users\ACER\Desktop\inla\kelantan.shp' using driver `ESRI Shapefile'
Simple feature collection with 66 features and 6 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 371629.6 ymin: 503028.2 xmax: 519479.6 ymax: 690232.8
Projected CRS: Kertau (RSO) / RSO Malaya (m)
plot(Kel)2.1 Insert projected coordinates reference system
st_crs(Kel) <- 3168
head(Kel)Simple feature collection with 6 features and 6 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 476592.2 ymin: 654046.7 xmax: 493907.6 ymax: 682355.5
Projected CRS: Kertau (RSO) / RSO Malaya (m)
NEGERI DAERAH MUKIM LELAKI PEREMPUAN JUM_JANTIN
1 KELANTAN BACHOK BEKLAM 4859 4813 9672
2 KELANTAN BACHOK GUNONG (GUNONG TIMOR) 11100 10884 21984
3 KELANTAN BACHOK MAHLIGAI 4564 4600 9164
4 KELANTAN BACHOK PERUPOK 8777 8614 17391
5 KELANTAN BACHOK MELAWI (REPEK) 9227 8672 17899
6 KELANTAN BACHOK TAWANG (MENTUAN) 14140 13632 27772
geometry
1 POLYGON ((485501.8 669698.8...
2 POLYGON ((487716.5 665649.5...
3 POLYGON ((482744.8 660223.4...
4 POLYGON ((486936.9 677358.5...
5 POLYGON ((490841.7 668783.4...
6 POLYGON ((486910.8 677819.3...
2.2 Obtain centroid of polygon
# obtain centroid of polygon
centroid <- Kel %>% mutate(lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
lat = map_dbl(geometry, ~st_centroid(.x)[[2]]))
# Select only woking variable
centroid <- centroid[,c(3,6,8,9)]3 Leptospirosis Data
3.1 Import Case Data
dat <- read_excel ("lepto.xlsx")
names(dat) [1] "Diagnosis" "Tahun Daftar" "Epid Daftar"
[4] "Age" "Alamat semasa/kejadian" "Poskod"
[7] "lat" "long" "Notifikasi Status"
[10] "Race" "Kewarganegaraan" "Gender"
[13] "Nationality" "Klasifikasi Kes" "Cara Pengesanan Kes"
[16] "Jenis Rawatan" "Daerah" "MUKIM"
[19] "Negeri"
# convert raw data into aggregated data by sub-district
aggregate1 <- group_by(dat, MUKIM) %>%
summarize( cases = n() )
head(aggregate1)# A tibble: 6 × 2
MUKIM cases
<chr> <int>
1 ALOR PASIR 11
2 BADANG 5
3 BANGGU 3
4 BATU MELINTANG (BELIMBING) 12
5 BATU MENGKEBANG 89
6 BEKLAM 1
3.2 Merging data between main data and centroid
# merge the data
main.dat <- merge(aggregate1, centroid, by.x = "MUKIM",by.y = "MUKIM")
names(main.dat)[1] "MUKIM" "cases" "JUM_JANTIN" "lon" "lat"
[6] "geometry"
# remove geometry
main.dat <- main.dat[,-c(6)]
# create spatial point using kertau
dat.kertau <- SpatialPoints(main.dat[, c("lon", "lat")], proj4string = CRS("+proj=omerc +no_uoff +lat_0=4 +lonc=102.25 +alpha=323.0257905 +gamma=323.130102361111 +k=0.99984 +x_0=804670.24 +y_0=0 +ellps=evrst69 +units=m +no_defs"))
# convert CRS to WGS84
dat.WGS84 <- spTransform(dat.kertau, CRS("+proj=longlat +datum=WGS84"))
# add longlat variable into main data frame
main.dat[, c("lat", "lon")] <- coordinates(dat.WGS84)
head(main.dat) MUKIM cases JUM_JANTIN lon lat
1 ALOR PASIR 11 10112 6.057231 102.0637
2 BADANG 5 35957 6.184074 102.2579
3 BANGGU 3 23049 6.049841 102.3182
4 BATU MELINTANG (BELIMBING) 12 8456 5.674871 101.7198
5 BATU MENGKEBANG 89 63575 5.551418 102.2425
6 BEKLAM 1 9672 6.035219 102.3482
3.3 Calculate of prevalence by sub-district
# calculate incidence of cases per 100000 population
main.dat <- main.dat %>% mutate(incidence = (cases / JUM_JANTIN) * 100000)4 Describing spatial data
4.1 Map the incidence
main.data.sf.merge <- merge(Kel, main.dat, by.x="MUKIM", by.y="MUKIM")
tmap_mode("view")tmap mode set to interactive viewing
tm_shape(main.data.sf.merge) + tm_borders("grey25", alpha=.2) +
tm_shape(main.data.sf.merge) +
tm_borders("grey25", alpha=.5)+
tm_fill("incidence", palette = "YlOrRd", style="cont", n=7,alpha=.6, fill.title = "", title = "Incidence of Leptospirosis Cases")4.2 Map the incidence by Centroid of Sub-districts
# create 4 color representing prevalence of cases
pal <- colorBin("viridis", bins = c(0, 100, 200, 300, 400))
#map the coordinate
leaflet(main.dat) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addCircles(lng = ~lat, lat = ~lon, color = ~pal(incidence)) %>%
addLegend("bottomright", pal = pal, values = ~ incidence, title = "Incidence", labFormat = labelFormat(digits = 5)) %>%
addScaleBar(position = c("bottomleft"))5 Import environmental covariate
5.1 Get tPolygon Map to crop the
Malaysia <- st_read("mys_admbnda_adm1_unhcr_20210211.shp")Reading layer `mys_admbnda_adm1_unhcr_20210211' from data source
`C:\Users\ACER\Desktop\inla\mys_admbnda_adm1_unhcr_20210211.shp'
using driver `ESRI Shapefile'
Simple feature collection with 16 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 99.64072 ymin: 0.855001 xmax: 119.2697 ymax: 7.380556
Geodetic CRS: WGS 84
Kel.nomukim <- subset(Malaysia, ADM1_EN=='Kelantan')Kel.WGS84 <- st_transform(Kel.nomukim, crs = 4326) # convert CRS from Kertau to WGS84 Kel.WGS84Simple feature collection with 1 feature and 9 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 101.3369 ymin: 4.549111 xmax: 102.6678 ymax: 6.245808
Geodetic CRS: WGS 84
ADM1_EN ADM1_PCODE ADM0_EN ADM0_PCODE date validOn validTo
3 Kelantan MY03 Malaysia MY 2020-12-02 2021-02-11 -001-11-30
Shape_Leng Shape_Area geometry
3 7.185704 1.234524 MULTIPOLYGON (((102.174 6.2...
5.2 Altitude
library(geodata)Warning: package 'geodata' was built under R version 4.4.1
Loading required package: terra
Warning: package 'terra' was built under R version 4.4.1
terra 1.7.78
Attaching package: 'terra'
The following object is masked from 'package:tidyr':
extract
r <- elevation_30s(country = "MYS", path = tempdir())5.3 Crop Altitude for Kelantan
altitude.r.crop <- crop(r, Kel.WGS84)
# remove raster value outside polygon
altitude.r.crop <- mask(r, Kel.WGS84)5.4 Plot Altitude for Kelantan
#plot the raster
tmap_mode("view")tmap mode set to interactive viewing
tm_shape(altitude.r.crop) +
tm_raster(style= "quantile", n = 7 , palette = viridis(7, direction = 1) ,alpha = .5,title = "Altitude (in meter)")stars object downsampled to 1710 by 584 cells. See tm_shape manual (argument raster.downsample)
5.5 Add covariate (elevation) into main data sets
main.dat$alt <- raster::extract(r, main.dat[, c("lat", "lon")])$MYS_elv_msk
head(main.dat) MUKIM cases JUM_JANTIN lon lat incidence alt
1 ALOR PASIR 11 10112 6.057231 102.0637 108.78165 10
2 BADANG 5 35957 6.184074 102.2579 13.90550 11
3 BANGGU 3 23049 6.049841 102.3182 13.01575 12
4 BATU MELINTANG (BELIMBING) 12 8456 5.674871 101.7198 141.91107 293
5 BATU MENGKEBANG 89 63575 5.551418 102.2425 139.99214 45
6 BEKLAM 1 9672 6.035219 102.3482 10.33912 6
6 Modeling
Here we specify the model to predict Leptospirosis in Kelantan and detail the steps to fit the model using the SPDE approach and the R-INLA package.
6.1 Mesh construction
# extract the boundary of Kelantan from sf-class object to sp-class
kel.bdry <- as(Kel.WGS84, "Spatial") %>% INLA::inla.sp2segment()# create the mesh and supply the boundary argument
coo <- cbind(main.dat$lat,main.dat$lon)
mesh <- INLA::inla.mesh.2d(boundary = kel.bdry,loc = coo, max.edge = c(0.1, 5), cutoff = 0.01)
# No. of mesh created
mesh$n[1] 1576
plot(mesh)
points(coo, col = "red")6.2 Build SPDE model on the mesh
spde <- INLA::inla.spde2.matern(mesh = mesh, alpha = 2)6.3 Index set
Now we generate the index set for the SPDE model. We do this with the function inla.spde.make.index() where we specify the name of the effect (s) and the number of vertices in the SPDE model (spde$n.spde). This creates a list with vector s equal to 1:spde$n.spde, and vectors s.group and s.repl that have all elements equal to 1s and size given by the number of mesh vertices.
indexs <- INLA::inla.spde.make.index("s", spde$n.spde)
lengths(indexs) s s.group s.repl
1576 1576 1576
6.4 Projection matrix
We need to build a projector matrix A that projects the spatially continuous Gaussian random field at the mesh nodes. The projector matrix A can be built with the inla.spde.make.A() function passing the mesh and the coordinates.
A <- INLA::inla.spde.make.A(mesh = mesh, loc = coo)6.5 Prediction data
Here we specify the locations where we wish to predict the prevalence. We set the prediction locations to the locations of the raster of the covariate elevation. We can get the coordinates of the raster r with the function crds() of the terra package.
library(terra)dp <- terra::crds(altitude.r.crop)
dim(dp)[1] 18150 2
In this example, we use fewer prediction points so the computation is faster. We can lower the resolution of the raster by using the aggregate() function of terra. The arguments of the function are
x: raster object,fact: aggregation factor expressed as number of cells in each direction (horizontally and vertically),fun: function used to aggregate values.
We specify fact = 5 to aggregate 5 cells in each direction, and fun = mean to compute the mean of the cell values.
ra <- aggregate(altitude.r.crop, fact = 5, fun = mean)dp <- terra::crds(ra)
dp <- as.data.frame(dp)
dim(dp)[1] 652 2
We call coop to the matrix of coordinates with the prediction locations. We add to the prediction data dp a column cov with the elevation values for the prediction coordinates.
coop <- terra::crds(ra)
dp$cov <- extract(ra, coop)[, 1]6.6 Projector matrix
We also construct the matrix that projects the spatially continuous Gaussian random field from the prediction locations to the mesh nodes.
Ap <- inla.spde.make.A(mesh = mesh, loc = coop)6.7 Stack data for the estimation and prediction
# stack for estimation stk.e
stk.e <- inla.stack(
tag = "est",
data = list(y = main.dat$cases, numtrials = main.dat$JUM_JANTIN),
A = list(1, A),
effects = list(data.frame(b0 = 1, elevation = main.dat$alt), s = indexs)
)
# stack for prediction stk.p
stk.p <- inla.stack(
tag = "pred",
data = list(y = NA, numtrials = NA),
A = list(1, Ap),
effects = list(data.frame(b0 = 1, elevation = dp$cov),
s = indexs
)
)
# stk.full has stk.e and stk.p
stk.full <- inla.stack(stk.e, stk.p)6.8 Model formula
We specify the model formula by including the response in the left-hand side, and the fixed and random effects in the right-hand side. In the formula, we remove the intercept (adding 0) and add it as a covariate term (adding b0), so all the covariate terms can be captured in the projection matrix.
formula <- y ~ 0 + b0 + elevation + f(s, model = spde)6.9 inla() call
We fit the model by calling inla() and using the default priors in R-INLA. We specify the formula, family, data, and options. In control.predictor we set compute = TRUE to compute the posteriors of the predictions. We set link=1 to compute the fitted values (res$summary.fitted.values and res$marginals.fitted.values) with the same link function as the family specified in the model. We also add control.compute = list(return.marginals.predictor = TRUE) to obtain the marginals.
res <- inla(formula,
family = "binomial", Ntrials = numtrials,
control.family = list(link = "logit"),
data = inla.stack.data(stk.full),
control.predictor = list(compute = TRUE, link = 1,
A = inla.stack.A(stk.full)),
control.compute = list(return.marginals.predictor = TRUE)
)summary(res)Time used:
Pre = 0.473, Running = 2.09, Post = 0.126, Total = 2.68
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
b0 -6.646 1.054 -8.680 -6.697 -4.262 -6.677 0.057
elevation 0.000 0.001 -0.002 0.000 0.002 0.000 0.000
Random effects:
Name Model
s SPDE2 model
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Theta1 for s -2.45 0.223 -2.887 -2.46 -2.01 -2.46
Theta2 for s 1.34 0.380 0.568 1.35 2.06 1.38
Marginal log-Likelihood: -222.10
is computed
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
7 Mapping the ouput
Now we map the Leptospirosis prevalence predictions using leaflet. The mean prevalence and lower and upper limits of 95% credible intervals are in the data frame res$summary.fitted.values. The rows of res$summary.fitted.values that correspond to the prediction locations can be obtained by selecting the indices of the stack stk.full that are tagged with tag = "pred". We can obtain these indices by using inla.stack.index() passing stk.full and tag = "pred".
index <- inla.stack.index(stack = stk.full, tag = "pred")$dataWe create vectors with the mean prevalence and lower and upper limits of 95% credible intervals with the values of the columns "mean", "0.025quant" and "0.975quant" and the rows given by index.
prev_mean <- res$summary.fitted.values[index, "mean"]
prev_ll <- res$summary.fitted.values[index, "0.025quant"]
prev_ul <- res$summary.fitted.values[index, "0.975quant"]7.1 Mapping predicted mean incidence using raster
# Create SpatVector object
sv <- terra::vect(coop, atts = data.frame(prev_mean = prev_mean,
prev_ll = prev_ll, prev_ul = prev_ul),
crs = "+proj=longlat +datum=WGS84")
# rasterize
r_prev_mean <- terra::rasterize(
x = sv, y = ra, field = "prev_mean",
fun = mean
)pal <- colorNumeric("viridis", c(0, 1), na.color = "transparent")
leaflet() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addRasterImage(r_prev_mean, colors = pal, opacity = 0.5) %>%
addLegend("bottomright",
pal = pal,
values = values(r_prev_mean), title = "Incidence"
) %>%
addScaleBar(position = c("bottomleft"))We can follow the same approach to create maps with the lower and upper limits of the prevalece estimates. First we create rasters with the lower and upper limits of the prevalences. Then we make the map with the same palette function we used to plot the mean prevalence.